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Abstract 


A new approach to change point detection based on smoothing and multiple testing is pre¬ 
sented for long data sequences modeled as piecewise constant functions plus stationary ergodic 
Gaussian noise. As an a ppli cation of the STEM algorithm for peak detection developed in 


Schwartzman et al. 


(1201 ih and ICheng & SchwartzmanI (120141) . the method detects change points 


as significant local maxima and minima after smoothing and differentiating the observed se¬ 
quence. The algorithm, combined with the Benjamini-Hochberg procedure for thresholding p- 
values, provides asymptotic strong control of the False Discovery Rate (FDR) and power con¬ 
sistency, as the length of the sequence and the size of the jumps get large. Simulations show 
that FDR levels are maintained in non-asymptotic conditions and guide the choice of smoothing 
bandwidth with respect to the desired location tolerance. The methods are illustrated in genomic 
array-CGH data. 


Key Words; False discovery rate; Gaussian; Kernel smoothing; Local maxima; Local minima; Location toler¬ 
ance. 


1 Introduction 


Detecting change points in the mean of an observed signal is a common statistical probl em with appli 


cation s in many research areas such as climatology (IReeves et al.l.l2007), oce anography dKillick et al. 


20101) . finance (IZeileis et al.L l2010h and medical imaging (INam et al. 


20121). It often appears in the 


ana lysis of time series 


3 ut it h a s more re c ently b een found in the an a lysis o f genomic sequences , 


see lErdman & Emerson (2008); 


Lai et al. 


j|2005h ; iMuggeo & Adelfiol (1201 ih ; 


Olshen et al. 


(120041) ; 


Tibshirani & Wangl (l2008h ; IWang et all (l2005h and the references therein. Given the large amounts of 
data present in modern applications, it is of interest to design a change point detection method that 
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can operate over long sequences where the number and location of change points are unknown, and 
in such a way that the overall detection error rate is controlled. 

Many differe nt approaches have been proposed to find and estimate change p oints, such as kernel- 
based methods (IGiibelsI. l2003t) . Bayesian me t hods (|Barry_&Hartigan, 


Olshen et al. 

2004; 

Wang et al. 

2005: 


1993 


Erdman & Emersonl 


Muggeo & Adelfio 


2005 


200 


I 


2011k 


Huang et al.L 


Tibshirani & Wang. 

20081). including the PEET method ( 

Jackson et al.. 

2005; 

Killick et al. 


2012lk However, the approach proposed here is unique in the following two ways. 


Eirst, the noise is assumed to be a stationary Gaussian process, allowing the error terms to be 
correlated. This is an important departure from the standard assumption of white noise in most 
of the change-point literature. In fact, applied statisticians desiring to use change-point methods 
have sometimes abandone d this option in fav or of other techniques simply because the white noise 
assumption does not hold Jnung et al. . 2013 1. This paper shows that change-point methods can be 


devised for correlated noise, expanding the domain of their applicability. 

Second, we use the theory of Gaussian processes to compute p-values for all candidate change 
points, so that significant change points can be selected at a desired significance level. Eor con¬ 
creteness, we adopt the Benjamini-Hochberg multiple testing procedure, enabling control of the false 
discovery rate (EDR) of detected change points when the data sequence is long and the number and 
location of change points are unknown. To our knowledge, this is the first article proposing a multiple 
testing method for controlling EDR of detected change points. 

In this paper, we consider a signal-plus-noise model where the true signal is a piecewise constant 
function and the cha nge points are defined as t he p oints of discontinuity. Inspired by the method 


for detecting peaks in 


Schwartzmanetal 


(1201 ih and ICheng & SchwartzmanI (120141) . we modify the 


STEM algorithm therein to detect change points. The central idea is the observation that the true 
signal has zero derivative everywhere except at the change points, where the derivative is infinite. 
Thus, in the presence of noise and under temporal or spatial sampling, change points can be seen 
as positive or negative peaks in the derivative of the smoothed signal. Note that because of the 
time sampling, derivatives cannot be observed directly and can only be estimated. The focus on the 
derivative of the smoothed signal effectively transforms the change point detection problem into a 
peak detection problem. As in the STEM algorithm, the resulting peak detection problem is then 
solved by identifying local maxima and minima of the derivative as candidate peaks and applying a 
multiple testing procedure to the list of candidates. 

The modified STEM algorithm for change point detection consists of the following steps: 


1. Differential kernel smoothing', to transform change points to local maxima or minima. 

2. Candidate peaks', find local maxima and minima of the differentiated smoothed process. 

3. P-values: computed at each local maximum and minimum under the the null hypothesis of no 
signal in a local neighborhood. 
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Figure 1: Following the notation in 112.11 and Example 13.41 the left panel is the observed 
signal-plus-noise model y{t) containing ten true change points with varying and noise 
z{t) given by (13.51 l with cr = 1 and v = 2. The right panel illustrates the STEM algorithm. 
The blue curve is y'^ (f), obtained with a Gaussian smoothing kernel with standard deviation 
7 = 6. Local maxima (green solid dots) and local minima (red solid dots) are declared as 
significant (marked with solid triangles) at EDR level a = 0.1 if their heights are beyond 
the dotted line thresholds. The cyan and pink bars indicate the location tolerance intervals 
{vi — b,Vi + b) with 6 = 5 for increasing and decreasing change points respectively. At 
this tolerance, there are nine true discoveries and one false discovery. 


4. Multiple testing: apply a multiple testing procedure to the set of local maxima and minima; de¬ 
clare as detected change points those local maxima and minima whose p-values are significant. 

The algorithm is illustrated by a toy example in Figure [T] 

The modified STEM algo rithm above differs from the ones in 


Schwartzman et al 


(1201 ll) and 

Cheng & SchwartzmanI (120141) in that peaks are sought in the derivative of the smoothed signal 


rather than the smoothed signal itself, and that both positive and negative peaks are considered. 
In addition, an important consideration for the proper definition of error in change point dete ction 
is that, as opposed to the pe ak detection problems considered in ISchwartzman et al.l (120111) and 
Cheng & SchwartzmanI (120141) where signal peaks had compact support, a true single change point at 


t = V has Lebesgue measure zero. Thus in the presence of noise, it can never be detected exactly at 
t = V. Therefore we introduce a location tolerance b that defines fhe precision wifhin which a change 
poinf should be defecfed. Specifically, given b, a defecfed change poinf is regarded as a frue discovery 
if if falls in fhe inferval {v — b,v + b). Conversely, if a significanf change poinf is found more fhan a 
disfance b from any frue change poinf, if is considered a false discovery. The quantify b is nof used in 
fhe STEM algorifhm ifself buf is needed for proper error definition. 

Under fhis convention, if is shown here fhaf fhe modified STEM algorifhm exhibifs asymptotic 
EDR confrol and power consisfency as fhe lengfh of fhe sequence and fhe size of f he jumps af fhe 
change poinfs increase. These asympfofic condifions are similar fo fhose considered in 


Schwartzman ef al. 
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(1201 ih andICheng & SchwartzmanI (12014) and, in fact, the proofs are easily extended from those in 


Cheng & Schwartzman J2014 ). 


Simulations for varying levels of smoothing bandwidth 7 , location tolerance b and jump size a 
are used to study the behavior of the algorithm under non-asymptotic conditions. The simulation 
results help guide the choice of smoothing bandwidth with respect to the desired location tolerance. 
In general, power increases with bandwidth to a limit dictated by the distance between the change 
points, so admitting a higher tolerance generally allows a higher bandwidth and higher power. 

The methods are illustrated in a geno mic sequence of array-CGH data in a breast-cancer tissue 


sample (ILoo et ah 


2005 


Hsu et al. 


2005h . The goal of the analysis is to find genomic segments with 


copy-number alterations. These are found by detecting change points in the copy number genomic 
sequence. 


2 The multiple testing scheme 

2.1 The model 

We consider a continuous time model, although the algorithm is designed for data discretely sampled 
in time. Consider the signal-plus-noise model 

y{t) = fi{t) + z{t), t€R, ( 2 . 1 ) 

where the signal y{t) is a piecewise constant function of the form 

00 

y{t) = '^ajhj{t), ajeM\{0}, 

f=o 

with hj{t) = l{t > Vj) for Vj G M. We are interested in finding the change points Vj. For the 
asymptotic analysis, we assume 

a = inf \aj\ > 0 and v = inf \vj — Vj-i\ > 0, 
j j 

so that the change points do not become arbitrarily small in size nor arbitrarily close to each other. 

Let Wj{t) = w{t/j)/j, where 7 > 0 is the bandwidth parameter and w{t) > 0 is a unimodal 
symmetric kernel with compact connected support [— c, c] and unit action. Convolving the process 
(12.11) with the kernel w^{t) results in the smoothed random process 

y^{t) = w^{t) * y{t) = / w^{t - s)y{s)ds = p^{t) + z^{t), ( 2 . 2 ) 

Jm. 

where the smoothed signal and smoothed noise are defined respectively as 

00 

P'yit) = w^{t) * p,{t) = ''^^ajhj^^{t) and Zj{t) = w^{t) * z{t), (2.3) 

j=0 
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and where the smoothed change point takes the form 

= w^{t) * hj{t). (2.4) 

The smoothed noise z^{t) defined by (12.31) is assumed to be a zero-mean four-times differentiable 
stationary ergodic Gaussian process. 


2.2 Change point detection as peak detection of the derivative 

Consider now the derivative of the smoothed observed process (12.21) 

y' {t) = w' (t) * y{t) = [ w' {t - s)y{s) ds = y! {t) + z' (t), 

where the derivatives of the smoothed signal and smoothed noise are respectively 

OO 

= w’^{t) * y{t) = ^ and zl^{t) = w'^{t) * z{t). 

j=0 

A key observation from (12.41) is that 

hj .y{t)= / w'^{t — s)hj{s) ds = / w'^{s)hj{t — s) ds 
= / w'^{s)l{t — s > Vj) ds = w^{t — Vj). 

J M 

Thus (12.51) represents a signal-plus-noise model where the smoothed signal 

OO OO 

/^7(i) = = Y - Vj) 

j=0 j=0 


(2.5) 


( 2 . 6 ) 


(2.7) 


is a sequence of unimodal peaks with the same shape as that of Wj and located at locations Vj. The 
problem of finding change poinfs in y^ (t) is fhus reduced fo finding (positive or negative) peaks in 

For simplicify, we assume fhaf fhe compacf supporfs Sj^.y of fhe smoofhed peak shape fi' ^ (t) = 
w^{t — Vj) do nof overlap, allhough Ibis is nof crucial in pracfice. 

2.3 The STEM algorithm for change point detection 

Suppose we observe y{t) wilh J jumps defined by (12.11) in fhe line of lengfh L centered af fhe 
origin, denoted by UjL) = ( —L/2,L/2). The following i s a version of fhe STEM algorifhm of 


Schwartzmanetal 


(1201 ih and ICheng & Schwartzmanl (120141) for detecfing change poinfs. 


Algorithm 2.1 (STEM algorithm for change point detection) 

1. Differenfial kernel smoofhing.- Obtain the process (12.51) by convolution of y{t) with the kernel 
derivative w!y{t). 
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2. Candidate peaks; Find the set of local maxima and minima of y'^{t) in U{L), denoted by 

= T+ U T”, where 

f+ = {t G U{L) : y''it) = 0, y!^\t) < o} , 

T- = {t€ U{L) : y'^it) = 0, > o} . 

3. P-values; For each t G T^, compute the p-value p.y{t)for testing the (conditional) hypotheses 

FLof){p'(s) = Q for all s^(t — h,t + h)} vs. 

'Hyl(t) : {//'(s) > 0 for some s G (t — b,t + b)}] 

and for each t G Tf, compute the p-value p-y(t)for testing the (conditional) hypotheses 

FLof) : {p'(s) = 0 for all sG{t — h,t + h)} vs. 

FLA{t) '■ < 0 for some s G (t — b,t + b)}, 

where b> Q is an appropriate location tolerance. 

4. Multiple testing; Let m-y = ^{t G T-/} be the number of tested hypotheses. Apply a multiple 
testing procedure on the set ofm..y p-values {p-^{t), t G T^}, and declare significant all local 
extrema whose p-values are smaller than the significance threshold. 


2.4 P-values 


Given the observed heights yiy{t) at the local maxima or minima t G = T+ U T.^ , p-values in step 
(3) of Algorithm 12. H are computed as 


p-f{t) 


F-f{y'^{t)), t G r+, 
P'li-y'yit)), t G ff, 


( 2 . 8 ) 


where Fy{u) denotes the right tail probability of z'^{t) at the local maximum t G T+, evaluated under 
the null model p'{s) = 0, Vs G (t — b,t-\- b), that is, 


F-f{u) = P > ri| f is a local maximum of z'^{t)) . (2.9) 

The second line in (12.81) is obtained by noting that, by (12.91) . 

P {ziy(t) < u| f is a local minimum of ziy{t)) 

= P (—z^(f) > —u\ f is a local maximum of —z'^{t)) = F-f{—u), 

since —z'^{t) and z'^{t) have the same distribution. 
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The d istribution (12.9b has a closedT orm expression, which can be obtained as in 


Schwartzman et al 


(1201 ih or ICramer & Leadbetten (1 19671) . More specifically, the distribution (12.91) is given by 


Fy{u) = 1 - $ 



27rA| 
- 
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( 2 . 10 ) 


where a’^'^ = \ 4 ^y = Var(z"(f)), = Var(z^^^(f)), A = - A|,y, and 

(/>(x), <l>(a:) are the standard normal density and cumulative distribution function, respectively. The 
quantities X^^^ and A 6,7 depend on the kernel w^{t) and the autocorrelation function of the 
original noise process z{t). Explicit expressions may be obtained, for instance, for the Gaussian 
autocorrelation model in Example l3.4l below. which we use later in the simulations. 


2.5 Error definitions 


Assuming the model of 112.11 define fhe signal region — b,Vj + 5) and null region 

E>q = U(L) \ E>\. Eor u > 0, lef T-y{u) = (u) U T~(u), where 

= ^tGU(L) : y'^{t) > u, y”{t) = 0, y®(f) < o| , 

= {t^U{L) ■. y'^{t) < -u, y"{t) = 0, y!^^\t) > o| , 

indicafing fhaf (n) and T~ (u) are respectively fhe sef of local maxima of y!^ (t) above u and fhe 
sef of local minima of yj(t) below —u. The number of fofally and falsely defecfed change poinfs al 
Ihreshold u are defined respectively as 


( 2 . 11 ) 


E^(u) = #{t e r+(n)} + #{f G r-(u)}, 

V^{u- h) = #{f G T+(u) n + #{f G f-{u) n s^}. 

Bolh are defined as zero if T^{u) is empty. The EDR al Ihr'eshold u is defined as fhe expecled 
proportion of falsely defecfed jumps 

V^{u-h) 


FDR 7 (m; b) =E 


R^{u) V 1 


( 2 . 12 ) 


Note lhal when 7 and u are fi xed, \X(u\ b) and hence FDR^ fu; b) are decreasing in b. 

Following fhe nofafion in ICheng & Schwartzmanl (120141) . define fhe smoothed signal region 
lo be fhe support of and smoothed null region 89,7 = U{L) \ 81 ^ 7 . We call fhe difference 
befween fhe expanded signal supporf due fo smoolhing and fhe frue signal supporf fhe transition 
region T 7 = 81^7 \ Sj = §0 \ ^ 0 , 7 - 


2.6 Power 

Denote by and I" fhe collecfions of indices j corresponding fo increasing and decreasing change 
poinfs Vj, respectively. We define fhe power of Algorifhm l2.1l as fhe expected fraclion of frue discov- 
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ered change points 


Power^(tt; b) 


1 

J 


J 

Powerj_^(u; 6) = E 

i=i 


1 

J 


^ 1 (f+ (u) n {Vj - b, Vj + b) ^ 
ie/+ 


+ ^ 1 (Ey (n) n {vj - b, Vj + 6) / 0 H , 

jGl- /. 


(2.13) 


where Poweij h) is the probability of detecting jump Vj within a distance b, 

Vawevj^^{u]b) = 

The indicator function in (12.131 ) ensures that only one significant local extremum is counted within a 
distance 6 of a change point, so power is not inflated. Note that when 7 and u are fixed, Power b) 
and Power b) are increasing in b. 


p f r+ (m) n {vj - b, Vj + 6) / 0 V if j e /+, 

P [f- (u) n {vj - b, Vj + 6) / 0j , if j e 


3 Asymptotic error control and power consistency 


Suppose the BH procedure is applied in step 4 of Algorithm 12. II as follows. For a fixed a G (0,1), let 
k be the largest index for which the zth smallest p-value is less than iajrh^. Then the null hypothesis 
'Hoit) att £ is rejected if 


, , ka 
p^{t) < — 




where ka/rh^ is defined as 1 if rhry = 
as 


> %H = 7 -'(fe) ifte7+, 

y'-,(t)<-UBH = -Fp[^) iftef-, 

0. Since ubh is random, we define FDR in such BH procedure 


FDRBH,7(i)) = E 


EyluBH; b) 
R'yiuBu) V 1 


where Ity{-) and Vy{-;b) are defined in ( 12.1 111 and the expectation is taken over all possible realiza¬ 
tions of the random threshold mbh- We will make use of the following conditions: 


(Cl) The assumptions of ^2.1l hold. 

(C2) L —^ 00 and a = infj |ajj —)► 00 , such that (log L)/a^ — ^ 0, J/L = Ai + 0{a~^ + 
with Ai > 0 . 


Let E[mo, 7 (C^(l))] and E[mo, 7 (C^(l), u)] be the expected number of local maxima and local 
maxima above level u of z!y{t) on u nit interval U(l) = (—1/ 2,1/2), respectively. In particular, we 
have the following explicit formula (ISchwartzman et alll201lh 


E[mo,7(t^(l))] = 


1 




27r \ A, 


4,7 


(3.2) 














Theorem 3.1 Let conditions (Cl) and (C2) hold. 

(i) Suppose Alsorithm \2.1\ is applied with a fixed threshold u > 0. Then 

FDR,(„; (,) < (C/(l) „)](! - ^ ^ 

^ 2E[mo,7(C/(l),?x)](l - 2c7^i) + 

(ii) Suppose Alsorithm \2.1\ is applied with the random threshold ubh (13.111 . Then 

2E[mo,^(C/(l))](l - 2c^Ai) 


FDRBH,7(ft) ^ OL 


2 E[mo, 7 (? 7 (l))](l - 207^1) + A] 


+ 0(a-i + L-^/^). 


Proof Since Wj(t) has compact support [— 07 , 07 ], by (12.61) . the support of p!y(t) in (12.71) is 
composed of the support segments [vj — cj^Vj + 07 ] of h'j^{f). By condition (C2), |Si^.y|/L = 
207^1 + 0(a~‘^ + which implies [§ 0 , 7 !/-^ = 1 — 2 c 77 li + 0(a~‘^ + 

Notice that, on the null region § 0,7 > the expected number of local extrema, including both local 
maxima and minima, equals 2|Sn-^lE [ ?fin.^( f7fl))1. On the other hand, similarly to the proof of 


Theorem 3 in 


Cheng & SchwartzmanI (l2014h . the expected number of local extrema on the signal 
region is asymptotically equivalent to J. This is because, for each j S and 6 > 0, as a —)■ 00 , 
asymptotically, there is no local maximum of 2 /!j,(f) in (vj — 07 , vj — 6 ) U (vj + b, vj + cj), and there 
is only one local maximum of y'.y{t) in (vj — b, Vj + b). 


The result then follows from similar arguments for proving Theorem 3 in ICheng & Schwartzman 


(12014t) with N = 1, A 2 ,y = 2cjAi, z.y(t) replaced by z!y(t) and E[ 7 tio, 7 ( 17 ( 1 ))] replaced by 


2E[mo,7(t^(l))]- 


□ 


Lemma 3.2 Let conditions (Cl) and (C2) hold. As \ aj \ 
(12.141) can be approximated by 


00 , the power for peak j and fixed u 


Powerj^.y(u; 6) = <1> 


aj|m..),( 0 ) — u 


cr' 


(l + OQajn). 


(3.3) 


Proof By (12.61) . h'-^(vj) = is the maxir num of h'^ ^,(t) over t £ R. Th e result then follows 

from similar arguments for proving Lemma 4 in ICheng & SchwartzmanI (l2014t) with 2 : 7 ( 1 ) replaced 
by z'Jt). □ 


By similar arguments in ICheng & SchwartzmanI (l2014t) (see equation (20) therein), one can show that 
the random threshold ubh converges asymptotically to the deterministic threshold 


= F. 


-1 


aAi 


Ai + 2E[mo, 7 ( 17 ( 1 ))](1 - 2c-fAi)(l - a) 


(3.4) 
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where E[mo, 7 (C/(l))] is given by (I3.2I) . Since mbh is random, similarly to the definition of FDRbh, 7 (&)> 
we define power in fhe BH procedure as 


PowerBH,7(^) = E 


j ( X] ^ (^bh) n {vj - b, Vj + 6) / 0^ 

\jei+ 

+ ^ 1 (^7 (ubh) n {vj - b, Vj + 6) 7 0^ 
i&i- 


Theorem 3.3 Let conditions (Cl) and (C2) hold. 

(i) Suppose Alsorithm \2. l\ is applied with a fixed threshold u > 0. Then 

Power^(u; b) = 1 — 0(o“^). 

(ii) Suppose Aleorithm \2.1\ is applied with the random threshold Ubh (13. 11 1. Then 

PowerBH,7(&) = 1 — 0{a~^ + 


Proof The desired resulfs follow from similar argumenfs for showing Theorem 5 in ICheng & Schwarfzman 
(l2014ll . □ 


Example 3.4 [Gaussian autocorrelation model] Let the noise z{t) in (12.11) be constructed as 



a,u > 0, 


(3.5) 


where f is the standard Gaussian density, dB{s) is Gaussian white noise and > 0 (z{t) is regarded 
by convention as Gaussian white noise when u = 0). Convolving with a Gaussian kernel w^{t) = 
(l/ 7 ) 0 (f/ 7 ) with 7 > 0 as in (12.31 ) produces a zero-mean infinitely differentiable stationary ergodic 
Gaussian field z^ (t) such that 

z(^{t) = * zf) = a dB(s), i 

/{Ay/TT^^), A 4 ^.^ = 3cr^/(8v^'^^) and A 6,7 = ‘S2>a '^/We have 


SNRj-= 


a^w. 


.( 0 ) 




V2\aj\ ( 7 ^ + 

fj7ri/4 7 


(3.6) 


As a function of 7 , the SNR has a local minimum at 7 * = y/2v and is strictly increasing for large 
7 . In particular, when = 0, it is strictly increasing in 7 . Thus we generally expect the detection 
power to increase with 7 for 7 > y/2v. This will be confirmed in the simulations below. Note that 
for z/ > 0, the SNR is unbounded as 7 —)• 0, however in practice 7 cannot be too small: if the support 
of rz;.y becomes smaller than the sampling interval, then the derivative cannot be estimated. 
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a = 1 


a = 2 


a = 3 



4 Simulation studies 

Simulations were used to evaluate the performance and limitations of the STEM algorithm for signals 
/r(f) = a\t/d\, where t = I,..., L, L = 12000, d G {100,15} and signal strength a G {1,2,3}. 
Under this setting, the true change points are Vj = jd for j = 1,..., L/d — 1. The noise is generated 
as the Gaussian process constructed in (13.51) with a = 1 and v = 2. The smoothing kernels are 
= (l/ 7 )(/)(f/ 7 )l(f G [— 47 , 47 ]) for varying 7 . The BH procedure was applied at FDR level 
a = 0.05. Results were averaged over 10,000 replications to simulate the expectations. 

Figure |2] shows the realized FDR and detection power for a change point separation of d = 100. 
A special color map was used for FDR to emphasize the control of FDR (FDR values less than 
the nominal level 0.05 appear in dark blue). We see that for every fixed bandwidth 7 , increasing 
the location tolerance h allows for detections to be counted as true farther away from the true change 
points, thereby decreasing FDR and increasing power. On the other hand, as expected from Theorems 
B.ll and lT^ as the strength of the signal a increases, FDR is eventually controlled below the nominal 
level and the power tends to 1 for every combination of parameters 7 and b. 

For a fixed tolerance b, increasing fhe bandwidfh 7 increases FDR by moving some frue change 
poinfs beyond b and producing arlificial errors. This also resulfs in some loss of power, which can 
be seen in Figure |2] especially when b is small and 7 is large. In addifion, for larger b and smaller 7 , 
fhe power is seen fo firsf decrease quickly and fhen increase again as 7 increases. This phenomenon 
coincides wifh fhe behavior of fhe SNR (13.61) derived in Example l3.4[ predicting fhe power to decrease 
for 7 < and increase for 7 > y/2iy. 

Figure [ 3 ] illusfrafes fhis phenomenon more clearly for fixed b = 8. The realized FDR curves have 
fhe same shape as fhe “fheorefical power” curves, obfained by plugging fhe asymptotic fhreshold Ugjj 
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1 



Smoothing bandwidth 7 

Figure 3; The solid curves and dotted curves (in both cases, blue for a = 1, green for 
a = 2 and red for a = 3) are respectively “theoretical powers” and realized powers from 
Figure|2]with b = 8. The pink dotted vertical line is 7 = v/2i/ as shown in Example [34l 


(13.41) into the approximate power (13.31) . Both curves get closer to 1 as the signal strength a increases. 

The separation of d = 100 in Figure |2] is large enough that bandwidths up to 7 = 10 do not 
produce any interference between neighboring change points. To investigate the effect of neighboring 
interference, Figure |4] shows the realized FDR and detection power for a change point separation of 
d = 15. It can be seen that increasing 7 beyond 5 will produce interference and contamination error, 
thereby decreasing power. Still, for any fixed 7 and b, FDR will eventually be controlled and power 
will increase if the signal strength is large enough. 


5 Data example 

5.1 Data description 


Array-based comparative genomic hybridization (array-CGH) is a high-throughput high-resolution 
technique used to evaluate changes in the number of copies of alleles at thousands of genomic loci 
simultaneously. The output is often called Copy Number Variation (CNV) data. Changes in copy 
number are represented by segments whose mean is displaced with respect to the background. To 
detect these changes, it is costumary to search for change points along the genome. 


Hsu el al. 

(2005 

) and 

Loo ef al. 

(2005) 


(I2005h . This sample is one of 37 formalin-fixed breasf cancer fumors 
in fhaf dafasef and if was chosen for ifs visual appeal in fhe illusfrafion of our mefhod. The dafa in 
chromosome 1 of fumor sample #18 consisfs of 968 average copy number reads mapped onfo 968 
unequally spaced locafions along fhe chromosome. For simplicify, fhe dafa was analyzed ignoring 
fhe gaps in fhe genomic locations: Figure |5] (lop leff) shows fhe dafa wifh spacings befween reads 
arlificially sel lo 1. Nole lhal ignoring fhe spacings does nol affecl fhe presence or absence of change 
poinls. 
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Figure 4; Realized FDR and power of the BH procedure for d = 15. 


5.2 Data analysis 

To analyze the data, the STEM algorithm was applied with a truncated Gaussian smoothing kernel 
w^{t) = € [— 47 , 47 ]) with 7 = 10. The bandwidth was chosen not too large 

in order to avoid interference between neighboring change points. Figure [5] (top right) shows the 
estimated first derivative (12.51) . Figure [5] (bottom left) marks 19 local maxima (green) and 19 local 
minima (red). 

P-values corresponding to local maxima and minima were computed according to (12.81) using 
the distribution (12.101) . The required parameters = Yav{z'^{t)), = Yav{z”{t)), Ae,^ = 

Yai{z\ (f)) were estimated empirically from the estimated first, second and third derivatives over the 
observed data sequence. However, the empirical variances were computed using truncated averages 
instead of regular averages in order to avoid bias from the extreme derivatives at the change points 
without assuming their presence or location in advance. The BH algorithm was applied to the 38 
p-values FDR level 0.2, yielding a p-value significance fhr'eshold of 4.42 x 10“^. The corresponding 
absolufe heighf fhreshold of 0.089 is marked as dashed lines in Figure [5] (bottom leff). The significanf 
peaks are ploffed on fhe original dafa in Figure [5] (bottom righf) wifh a locafion tolerance of 6 = 2 for 
visual reference. 


6 Discussion 

In fhis paper, we combined bofh local maxima and minima as candidate peaks, and fhen applied a 
mulfiple fesfing procedure to find a uniform fhreshold (in absolufe value) for defecting all change 
poinfs. This approach is sensible when fhe disfribufions (number and heighf) of frue increasing and 
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Figure 5; Data example. Top left; Observed data. Top right; estimated first derivative. 
Bottom left; local maxima (upward triangles), local minima (downward triangles) of the 
estimated first derivative, and significance height threshold (black dashed). Bottom right; 
The detected positive (green) and negative (red) change points. 


decreasing change points are about the same. Alternatively, different thresholds for detecting increas¬ 
ing and decreasing change points could be found by applying separate multiple testing procedures to 
the sets of candidate local maxima and local minima. While we applied the BH algorithm to control 
FDR, in principle other multiple testing procedures may be used to control other error rates. 

A natural and important question is how to choose the smoothing bandwidth 7 . From Example 
13 .4 1 and Figures |2] and IH we see that either a very small 7 or a relatively large 7 is preferred in order 
to increase power, but only to the extent that the smoothed signal supports h'- ^ (t) have little overlap 
and that detected change points are not displaced by more than the desired tolerance b (recall that 
the value of b is not used in the STEM algorithm itself, but it may be determined by the needs of 
the specific scientific application). Considering the Gaussian kernel to have an effective support of 
±C 7 , a good value of 7 may be about min(6, d/c), where d is the separation between change points. 
Since the location of the change points is unknown, a more precise optimization of 7 may require an 
iterative procedure. Moreover, if some change points are close together and others are far apart, an 
adpative bandwidth may be preferable. We leave these as problems for future research. 


14 





















































References 


Adler, R. J. & Taylor, J. E. (2007). Random Fields and Geometry. Springer, New York. 

Barry, D. & Hartigan, J. A. (1993). A Bayesian analysis for change point problems. J. Am. Statist. Assoc. 
88 , 309-319. 

Benjamini, Y. & HocHBERG, Y. (1995). Controlling the false discovery rate; a practical and powerful 
approach to multiple testing. J. R. Statist. Soc. B 57, 289-300. 

Cheng, D. & Schwartzman, A. (2014a). Distribution of the height of local maxima of Gaussian random 
fields. Extremes, to appear. arXiv: 1307.5863 

Cheng, D. & Schwartzman, A. (2014b). Multiple testing of local maxima for detection of peaks in random 
fields. arXiv; 1405.1400 

Cramer, H. & Leadbetter, M. R. (1967). Stationary and Related Stochastic Processes. Wiley, New York. 

Eilers, R H. C. & DE Menezes, R. X. (2005). Quantile smoothing of array CGH data. Bioinformatics 21 , 
1146-1153. 

Erdman, C. & Emerson J. W. (2008). A fast Bayesian change point analysis for the segmentation of 
microarray data. Bioinformatics 24 , 2143-2148. 

Gijbels, I. (2003). Inference for nonsmooth regression curves and surfaces using kernel-based methods, a 
survey collected in “Recent advances and trends in nonparametric statistics ” by Michael G. Akritas and 
Dimitris N. Politis. 

Huang, T., Wu, B., Lizardi, R & Zhao, H. (2005). Detection of DNA copy number alterations using 
penalized least squares regression. Bioinformatics 21 , 3811-3817. 

Hung, Y., Wang, Y., Zarnitsyna, V., Zhu, C. & Wu C. E. J. (2013). Hidden Markov models with 
applications in cell adhesion experiments. J. Am. Statist. Assoc. 108 , 1469-1479. 

Hsu, L., Self, S. G., Grove, D., Randolph, T., Wang, K., Delrow, J. J., Loo, L. & Rorter, R. 
(2005). Denoising array-based comparative genomic hybridization data using wavelets. Biostatistics 6, 
211-226. 

Jackson, B., Scargle, J. D., Barnes, D., Arabhi, S., Alt, A., Gioumousis, R, Gwin, E., Sang- 
TRAKULCHAROEN, R, Tan, L. & TSAI, T. T. (2005). An Algorithm for Optimal Partitioning of Data on 
an Interval. IEEE, Signal Processing Letters 12 , 105-108. 

Killick, R., Eckley, I. A., Ewans, K. & Jonathan, P. (2010). Detection of changes in variance of 
oceanographic time-series using changepoint analysis. Ocean Engineering 37, 1120-1126. 

Killick, R., Eearnhead, P. & Eckley, I. A. (2012). Optimal detection of changepoints with a linear 
computational cost. J. Am. Statist. Assoc. 107 , 1590-1598. 

Lai, W. R., Johnson, M. D., Kucherlapati, R. & Park, P. J. (2005). Comparative analysis of algorithms 
for identifying amplifications and deletions in array CGH data. Bioinformatics 21 , 3763-3770. 


15 


Loo, L. W. M., Grove, D. I., Neal, C. L., Cousens, L. A., Schubert, E. L., Williams, E. M., 
Holcomb, I. N., Delrow, J. J., Trask, B. J., Hsu, L. & Porter, P. L. (2004). Array-CGH analysis 
of genomic alterations in breast cancer subtypes. Cancer Research 64 , 8541-8549. 

Lanzante, j. R. (1996). Resistant, robust and non-parametric techniques for the analysis of climate data: 
Theory and example, including applications to historical radiosonde station data. International Journal of 
Climatology 16 , 1197-1226. 

MugGEO, V. M. R. & Adeleio, G. (2011). Efficient change point detection for genomic sequences of 
continuous measurements. Bioinformatics 27, 161-166. 

Nam, C. E. H., Aston, J. A. D. & Johansen, A. M. (2012). Quantifying the uncertainty in change points. 
Journal of Time Series Analysis 33, 807-823. 

Olshen, a. B., Venkatraman, E. S., Lucito, R. & WiGLER, M. (2004). Circular binary segmentation 
for the analysis of array-based DNA copy number data. Biostatistics 5 , 557-572. 

Reeves, J., Chen, J., Wang, X. L., Lund, R. & Lu, Q. Q. (2007). A Review and Comparison of Change- 
point Detection Techniques for Climate Data. Journal of Applied Meteorology and Climatology 46 , 900-915. 

Schwartzman, a., Dougherty, R. E. & Taylor, J. E. (2008). Ealse discovery rate analysis of brain 
diffusion direction maps. Ann. Appl. Statist. 2, 153-175. 

Schwartzman, A., Gavrilov, Y. & Adler, R. J. (2011). Multiple testing of local maxima for detection 
of peaks in ID. Ann. Statist. 39, 3290-3319. 

Tibshirani, R. (& Wang, P. (2008). Spatial smoothing and hot spot detection for CGH data using the fused 
lasso. Biostatistics 9, 18-29. 

Wang, P., Kim, Y., Pollack, J., Narasimhan, B. & Tibshirani, R. (2005). A method for calling gains 
and losses in array CGH data. Biostatistics 6, 45-58. 

Zeileis, a., Shah, A. & Patnaik, I. (2010). Testing, Monitoring, and Dating Structural Changes in Ex¬ 
change Rate Regimes. Computational Statistics & Data Analysis 54 , 1696-1706. 


16 


